
Run auto-tagging only if no documents found by OCR #918

Open
SCDevel wants to merge 1 commit into icereed:main from SCDevel:patch-1

@SCDevel SCDevel commented Mar 5, 2026


ISSUE

Currently, if someone applies the AUTO_TAG and AUTO_OCR_TAG tags to some documents, waits a short time (e.g. 15 seconds), and then tags more documents, the second batch may not be OCR'd until tagging is complete. This would be fine if LLM startup times were not potentially very slow. (I initially hosted the models on an HDD, where startup could take minutes.)

SOLUTION

If OCR has processed documents, skip tagging and check OCR again. This rechecks for new documents,
which in theory keeps the OCR model alive and prevents the slow start-and-stop behavior.
It should also help when users mass-upload files with a workflow that sets the auto tags on consumption (or use the folder tags feature).

FLAW

If a document is processed fast enough, the second check could run before new documents arrive and miss them.
A delay might help. It would probably still be faster even with a 10-20 s delay.

TESTING

I have not tested this, as it is a relatively small change. I also honestly don't know how the models are started and stopped, so this may not even fundamentally work, but I figured the regular maintainers would know that better than me.

Summary by CodeRabbit

  • Bug Fixes
    • Optimized document processing by preventing redundant auto-tagging when OCR has already successfully processed documents. Auto-tagging now runs conditionally to improve system efficiency.

This is to help prevent the task-switching overhead that comes from the potentially long startup times of models.

coderabbitai Bot commented Mar 5, 2026

Walkthrough

The auto-tagging step in background.go now executes conditionally—only when OCR produces no documents. Previously, auto-tagging always ran after OCR regardless of its results. This optimization prevents redundant processing and improves efficiency by skipping unnecessary auto-tagging operations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Background Processing Control Flow: background.go | Made auto-tagging conditional on OCR results; auto-tagging only executes if OCR returns zero documents, avoiding redundant processing. Control flow now checks OCR document count before proceeding to auto-tagging phase. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 When OCR completes its hoppy scan,
No need for tagging—skip the plan!
Smart conditionals save the day,
Processing cleaner, the rabbit way! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately captures the main control flow change: auto-tagging now only runs when OCR produces no documents, avoiding redundant processing. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

